[TDX] Added basic documentation to enable TDX in ChatQnA #1212

Draft: wants to merge 6 commits into base: main
JakubLedworowski commented Nov 28, 2024

Description

Confidential computing for AI in the cloud protects sensitive data and computations from unauthorized access and tampering. It uses hardware-based isolation and encryption to create secure environments in which data and AI models can be processed safely, so that even the cloud service provider cannot access the data. With confidential computing, organizations can use AI in the cloud for tasks that involve sensitive information, such as healthcare data analysis or financial predictions, while complying with strict data protection regulations.

This change introduces a guide to protecting selected microservices with Intel TDX:

  • added README_tdx.md
  • added chatqna_tdx.yaml, which configures all microservices with TDX protection and default settings
  • described the additional steps needed to run ChatQnA with a custom setup

Issues

n/a

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

n/a

Tests

Manual tests with a sample request, with TDX enabled in all ChatQnA services:
dataprep, embedding, llm, redis, reranking, retriever, tei, teirerank, tgi

### Kubelet Configuration

To run a complex and heavy application like OPEA, the cluster administrator must increase the kubelet timeout for container creation; otherwise, pod creation may fail with the timeout error `Context deadline exceeded`.
This is required because container creation can take a long time, given the size of the pod images and the need to download the AI models.
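
As a rough illustration of the timeout change described above, the sketch below sets the `runtimeRequestTimeout` field of a kubelet configuration file. It assumes this is the setting the guide refers to (it is the KubeletConfiguration field with a 2-minute default). The helper name `set_runtime_request_timeout`, the file path, and the `30m` value in the usage comment are illustrative assumptions and do not come from this PR.

```shell
#!/usr/bin/env bash
# Sketch: raise the kubelet's runtimeRequestTimeout in a config file.
# Assumptions (not from the PR text): the helper name, the config path and
# the 30m value shown in the usage comment are illustrative only.

set_runtime_request_timeout() {
  local config_file="$1" timeout="$2" tmp
  if grep -q '^runtimeRequestTimeout:' "$config_file"; then
    # Key already present at top level: rewrite its value via a temp file,
    # then copy the result back so the original file keeps its permissions.
    tmp=$(mktemp)
    sed "s/^runtimeRequestTimeout:.*/runtimeRequestTimeout: ${timeout}/" \
      "$config_file" > "$tmp" && cat "$tmp" > "$config_file"
    rm -f "$tmp"
  else
    # Key absent: make sure the file ends with a newline, then append the key.
    if [ -s "$config_file" ] && [ -n "$(tail -c 1 "$config_file")" ]; then
      printf '\n' >> "$config_file"
    fi
    printf 'runtimeRequestTimeout: %s\n' "$timeout" >> "$config_file"
  fi
}

# Usage (on the node, as root; path and value are assumptions):
#   set_runtime_request_timeout /var/lib/kubelet/config.yaml 30m
#   systemctl restart kubelet
```

The function only edits the file it is given, so it can be tried on a copy of the configuration before touching a live node; the kubelet has to be restarted for a new value to take effect.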

Contributor:
Is this timeout change generally required for any k8s deployment? If so should this be added to the main k8s readme?

Author:
This is generally required for all use cases where container creation takes a long time. When TDX is involved, container creation time increases so much that it usually exceeds the default 2 minutes. It is described in the k8s docs: https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/

Contributor:
Is that just for peer pods, or does running CoCo on the host also often exceed 2 minutes?

ChatQnA/kubernetes/intel/README_tdx.md (outdated)
> [!NOTE]
> Running TDX-protected services requires the user to define the pod's resources request (cpu, memory).
>
> Due to the lack of a hotplugging feature in TDX, the assigned resources cannot be changed after the pod is scheduled, and they will not be shared with any other pod.
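
To make the note above concrete, here is a minimal sketch that prints a pod manifest with explicit `cpu` and `memory` requests and a runtime class. Everything passed to the function (pod name, runtime class name, resource sizes, image) is a hypothetical placeholder, and the helper `render_tdx_pod` is not part of this PR; the actual values used by `chatqna_tdx.yaml` are not shown in this conversation.

```shell
#!/usr/bin/env bash
# Sketch: render a minimal Pod manifest that declares resource requests.
# All arguments are caller-supplied placeholders (assumptions, not values
# taken from the PR): name, runtime class, cpu, memory, image.

render_tdx_pod() {
  local name="$1" runtime_class="$2" cpu="$3" memory="$4" image="$5"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  runtimeClassName: ${runtime_class}
  containers:
    - name: ${name}
      image: ${image}
      resources:
        requests:
          cpu: "${cpu}"
          memory: "${memory}"
EOF
}

# Usage (validates the manifest client-side without creating anything):
#   render_tdx_pod demo-svc <runtime-class> 4 8Gi <image> \
#     | kubectl apply --dry-run=client -f -
```

Because the resources cannot be changed once the pod is scheduled, the requests in such a manifest should be sized for the service's peak needs up front.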

Author:
check hotplugging (TEE-specific? or kata-specific?)

>
> After the kubelet restarts, some of the internal pods in the `kube-system` namespace might be reloaded automatically.

All kubelet configuration options can be found [here](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/).

Author:
remove

Comment on lines 135 to 140
```bash
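# Take the first pod whose name contains 'chatqna-tgi'.
POD_NAME=$(kubectl get pods --no-headers | awk '/chatqna-tgi/ {print $1; exit}')
# Print the runtime class assigned to that pod.
kubectl get pod "$POD_NAME" -o jsonpath='{.spec.runtimeClassName}'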
```

In the output you should see:

Author:
Just show that it is running

- added README_tdx.md
- described steps to run ChatQnA using helm and GMC

Signed-off-by: Jakub Ledworowski <[email protected]>
- Removed deployment option with helm
- Added sample chatqna_tdx.yaml
- Generalized description but left ChatQnA as an example

Signed-off-by: Jakub Ledworowski <[email protected]>
Signed-off-by: Jakub Ledworowski <[email protected]>
Signed-off-by: Jakub Ledworowski <[email protected]>
2 participants